About the Provider

Moonshot AI is a Chinese AI research company focused on building large-scale foundation models with advanced agentic and multimodal capabilities. Kimi K2 Thinking is its flagship open-weights reasoning model, and the first open-weights model to outperform leading closed models, including GPT-5 and Claude Sonnet 4.5, on several major benchmarks.

Model Quickstart

This section helps you quickly get started with the moonshotai/Kimi-K2-Thinking model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the moonshotai/Kimi-K2-Thinking model and receive responses based on your input prompts. The example below shows how the model can be accessed; adapt it to the client or language that best fits your workflow.
from openai import OpenAI

# Initialize the OpenAI client with the Qubrid base URL
client = OpenAI(
    base_url="https://platform.qubrid.com/v1",
    api_key="QUBRID_API_KEY",  # replace with your Qubrid API key
)

# Create a streaming chat completion
stream = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Thinking",
    messages=[
        {
            "role": "user",
            "content": "Explain quantum computing in simple terms",
        }
    ],
    max_tokens=16384,
    temperature=1,
    top_p=0.95,
    stream=True,
)

# Streaming: print tokens as they arrive
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Non-streaming: set stream=False above, remove the loop, and read the
# full response object instead:
# print(response.choices[0].message.content)
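If you prefer raw HTTP over the OpenAI SDK, the same request can be sent with Python requests. This is a minimal sketch assuming the Qubrid endpoint is OpenAI-compatible at /v1/chat/completions (as the SDK example above implies); the API key is read from the QUBRID_API_KEY environment variable rather than hard-coded.

```python
import os

import requests

QUBRID_BASE_URL = "https://platform.qubrid.com/v1"


def build_payload(prompt: str, stream: bool = False) -> dict:
    """Assemble an OpenAI-compatible chat-completions payload."""
    return {
        "model": "moonshotai/Kimi-K2-Thinking",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16384,
        "temperature": 1,
        "top_p": 0.95,
        "stream": stream,
    }


# Only fires when a key is present in the environment.
if os.environ.get("QUBRID_API_KEY"):
    resp = requests.post(
        f"{QUBRID_BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['QUBRID_API_KEY']}"},
        json=build_payload("Explain quantum computing in simple terms"),
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

Keeping the key in the environment avoids committing credentials alongside example code.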

Model Overview

Kimi K2 Thinking is the first open-weights model to achieve state-of-the-art results against leading closed models, including GPT-5 and Claude Sonnet 4.5, on major benchmarks: HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%).
  • Built on a 1T-parameter MoE architecture with 32B parameters active per token and native INT4 quantization via Quantization-Aware Training (QAT), it runs at roughly 2x the generation speed of FP8 deployments.
  • The model maintains stable tool-use across 200–300 sequential calls within a 256K context window, with interleaved chain-of-thought and dynamic tool calling for complex agentic workflows.

Model at a Glance

Feature          Details
Model ID         moonshotai/Kimi-K2-Thinking
Provider         Moonshot AI
Architecture     Sparse MoE Transformer: 1T total / 32B active per token, native INT4 via Quantization-Aware Training (QAT)
Parameters       1T total / 32B active per token
Context Length   256K tokens
Release Date     2025
License          Apache 2.0
Training Data    Large-scale multilingual dataset with RL post-training for agentic reasoning and tool use

When to use?

You should consider using Kimi K2 Thinking if:
  • You need complex agentic research workflows with multi-step tool orchestration
  • Your application requires long-horizon coding and debugging
  • You are solving advanced mathematical reasoning tasks
  • Your use case involves autonomous writing and analysis
  • You need a model that outperforms GPT-5 and Claude Sonnet 4.5 on open benchmarks
  • Your workflow requires stable tool use across 200–300 sequential calls

Inference Parameters

Parameter      Type      Default   Description
stream         boolean   true      Enable streaming responses for real-time output.
temperature    number    1         Sampling temperature; 1.0 is recommended for Kimi-K2-Thinking.
max_tokens     number    16384     Maximum number of tokens to generate.
top_p          number    0.95      Nucleus sampling threshold.
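These defaults can be collected in one place so every request uses consistent settings. A minimal sketch; inference_params is an illustrative helper, not part of any Qubrid SDK:

```python
# Documented defaults for moonshotai/Kimi-K2-Thinking on Qubrid.
DEFAULTS = {
    "stream": True,
    "temperature": 1.0,  # recommended value for Kimi-K2-Thinking
    "max_tokens": 16384,
    "top_p": 0.95,
}


def inference_params(**overrides) -> dict:
    """Merge caller overrides onto the documented defaults,
    rejecting parameter names the table does not define."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```

Rejecting unknown names catches typos like `temp` before they are silently dropped by the API.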

Key Features

  • First Open-Weights Model to Beat Closed Frontier Models: Achieves SOTA on HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%), surpassing GPT-5 and Claude Sonnet 4.5.
  • Native INT4 via QAT: Quantization-Aware Training enables INT4 inference at 2x the speed of FP8 without accuracy loss.
  • Stable Long-Horizon Tool Use: Maintains consistent tool-calling behaviour across 200–300 sequential calls within a single context window.
  • Interleaved Chain-of-Thought: Dynamically interleaves reasoning traces with tool calls for interpretable agentic execution.
  • 1T MoE Architecture: Frontier-scale capacity with only 32B parameters active per token for efficient inference.
  • 256K Context Window: Supports long-horizon document analysis, multi-turn agentic tasks, and extended reasoning chains.
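The long-horizon tool-use loop described above can be sketched against an OpenAI-compatible tool-calling schema. This is a hedged illustration: the get_time tool and its registry are hypothetical, and the surrounding model-call loop is shown only in comments.

```python
import json

# Hypothetical local tool registry; the tool name and its behaviour
# are illustrative only, not part of the model or the Qubrid API.
TOOLS = {"get_time": lambda: "2025-01-01T00:00:00Z"}


def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one model-requested tool call (OpenAI-compatible shape)
    and format the result as a `tool` message for the next model turn."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"] or "{}")
    result = TOOLS[fn["name"]](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": str(result),
    }


# Agentic loop (network calls omitted):
#   while the assistant message contains tool_calls:
#       messages.append(assistant_message)
#       messages.extend(dispatch_tool_call(tc) for tc in tool_calls)
#       assistant_message = client.chat.completions.create(...)
```

With a 256K window, the message list from 200-300 such iterations can stay inside a single context rather than being summarized away.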

Summary

Kimi K2 Thinking is Moonshot AI’s flagship open-weights reasoning model and the first open-weights model to surpass leading closed frontier models on major benchmarks.
  • It uses a 1T MoE Transformer with 32B active parameters and native INT4 via QAT, running at 2x the speed of FP8 deployments.
  • It achieves SOTA on HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%), outperforming GPT-5 and Claude Sonnet 4.5.
  • The model supports 256K context, stable 200–300 sequential tool calls, and interleaved chain-of-thought reasoning.
  • Licensed under Apache 2.0 for full commercial use.